Temporal correlations and neural spike train entropy 
Sampling considerations limit the experimental conditions under which information theoretic analyses of neurophysiological data yield reliable results. We develop a procedure for computing the full temporal entropy and information of ensembles of neural spike trains, which performs reliably for limited samples of data. This approach also yields insight into the role of correlations between spikes in temporal coding mechanisms. When applied to recordings from complex cells of the monkey primary visual cortex, the method yields information estimates with lower RMS error than a 'brute force' approach.
PACS numbers: 87.19.Nn, 87.19.La, 89.70.+c, 07.05.Kf



Cells in the central nervous system communicate by means of stereotypical electrical pulses called action potentials, or spikes. The Shannon information content of neural spike trains is fully described by the sequence of times of spike emission. In principle, the pattern of spike times provides a large capacity for conveying information beyond that due to the code commonly assumed by physiologists, the number of spikes fired […]. Reliable quantification of this spike timing information is made difficult by undersampling problems that scale with the number of possible spike patterns, and thus grow, up to exponentially, with the precision of spike observation (see Fig. 1). While advances have been made in experimental preparations where extensive sampling may be undertaken […], our understanding of the temporal information properties of nerve cells from less accessible preparations such as the mammalian cerebral cortex is limited.

Any direct estimate of the complete spike train information is limited by sampling considerations to relatively small wordlengths, and therefore to the analysis of short time windows of data. However, it is possible to take advantage of this restriction itself to obtain estimators which have better sampling properties than a 'brute force' approach. In this Letter we present an approach based upon a Taylor series expansion of the entropy, to second order in the time window of observation. The analytical expression so derived allows the ensemble spike train entropy to be computed from limited data samples, and relates the entropy and information to the instantaneous probability of spike occurrence and the temporal correlations between spikes. Comparison with other procedures such as the 'brute force' approach […] indicates that our analytical expression gives substantially better performance for data sizes of the order typically obtained from mammalian neurophysiology experiments, as well as providing insight into potential coding mechanisms.

Consider a time period of duration $T$, associated with a dynamic or static sensory stimulus, during which the activity of $C$ cells is observed. The neuronal population response to the stimulus is described by the collection of spike arrival times $\{t_i^a\}$, $t_i^a$ being the time of the $i$-th spike emitted by the $a$-th neuron. The spike time is observed with finite precision $\Delta t$, and this bin width is used to digitise the spike train (Fig. 1). For a given discretisation (temporal precision), the entropy of the spike train is a well-defined quantity. The total entropy of the spike train ensemble is

$$H(\{t_i^a\}) = -\sum_{\{t_i^a\}} p(\{t_i^a\}) \log_2 p(\{t_i^a\}), \qquad (1)$$

where the summation is over all possible spike times within $T$ and over all possible total spike counts from the population of cells. This entropy quantifies the total variability of the spike train. Each different stimulus history (time course of characteristics within $T$) is denoted as $s$. The noise entropy, which quantifies the variability in response to repeated presentations of the same stimulus, is $H^{\rm noise} = \langle H(\{t_i^a\}|s)\rangle_s$, where the angular brackets indicate the average over different stimuli, $\langle A(s)\rangle_s = \sum_{s\in S} P(s)A(s)$. The mutual information that the responses convey about which stimulus history invoked the spike train is the difference between these two quantities.
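For reference, the 'brute force' estimate of Eq. (1), together with the noise entropy and the information defined above, can be sketched by counting binary words. This is a minimal illustration rather than the authors' code: it assumes the spike trains are already binned into an array of shape (stimuli, trials, cells, bins), that stimuli are equiprobable unless `p_s` is given, and the function names `plugin_entropy` and `brute_force_information` are hypothetical.

```python
import numpy as np
from collections import Counter


def plugin_entropy(probabilities):
    """Entropy in bits of a probability vector, with 0 log 0 taken as 0."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def brute_force_information(spikes, p_s=None):
    """Plug-in total entropy, noise entropy and information, in bits.

    spikes: binary array (n_stimuli, n_trials, n_cells, n_bins); each trial's
            population response is treated as one binary word.
    p_s:    stimulus probabilities P(s); equiprobable if None (an assumption).
    """
    spikes = np.asarray(spikes, dtype=np.uint8)
    n_stim, n_trials = spikes.shape[:2]
    p_s = np.full(n_stim, 1.0 / n_stim) if p_s is None else np.asarray(p_s, float)

    pooled = Counter()   # p({t}) = sum_s P(s) p({t}|s)
    h_noise = 0.0        # <H({t}|s)>_s
    for s in range(n_stim):
        counts = Counter(trial.tobytes() for trial in spikes[s])
        p_cond = {w: c / n_trials for w, c in counts.items()}
        h_noise += p_s[s] * plugin_entropy(list(p_cond.values()))
        for w, p in p_cond.items():
            pooled[w] += p_s[s] * p
    h_total = plugin_entropy(list(pooled.values()))
    return float(h_total), float(h_noise), float(h_total - h_noise)


# Two stimuli, one cell, two bins: each stimulus always evokes its own word.
spikes = np.zeros((2, 4, 1, 2))
spikes[0, :, 0, 0] = 1
spikes[1, :, 0, 1] = 1
print(brute_force_information(spikes))   # (1.0, 0.0, 1.0)
```

The number of distinct words that must be sampled grows as $2^{CL}$, which is the undersampling problem described in the opening paragraphs.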






[Figure 1: spike rasters digitised into bins of width $\Delta t$ over a window $T = L\Delta t$]

FIG. 1. Digitising spike trains into binary 'words' with a given precision. A common experimental structure has a number of repeats for each separate stimulus (one stimulus shown). The spike emission times for each such 'trial' are binned with resolution $\Delta t$, as shown for the last raster. There are $2^L$ possible words when examining data from a time window of duration $T = L\Delta t$.

These entropies may be expanded as a Taylor series in the time window of measurement:

$$H = T H_t + \frac{T^2}{2} H_{tt} + O(T^3). \qquad (2)$$



To compute the Taylor expansion, we made the following assumptions. (i) The time window is short enough, or the firing rate low enough, that there are few spikes per stimulus presentation. (ii) The entropy is analytic in $T$. (iii) Different trials are random realisations of the same process. We will use the bar notation for the average over trials at fixed stimulus, such that if $r_a(t;s) = \delta_{t,t_i^a(s)}/\Delta t$, the time-dependent instantaneous firing rate $\bar r_a(t;s)$ is its average over experimental trials. (iv) Spikes are not locked to each other with infinite precision; in other words, the conditional probability of a spike occurring at time $t_j^b$, given the occurrence of a particular spike pattern, scales for small $\Delta t$ proportionally to $\Delta t$ plus higher order terms, with no $O(1)$ terms: $P(t_j^b|\{t_i^a\};s) \propto \Delta t + \cdots$ for each possible spike pattern $\{t_i^a\}$. The validity of these assumptions has been examined elsewhere […].

The probability of observing a pattern with $k$ spikes can be expressed as a product of $k$ probabilities, each being the probability of one spike given the presence of the others. Thus from (iv), the probability of this pattern is proportional to $\Delta t^k$, and the expansion is essentially in the total number of spikes emitted. This also implies that only the conditional probabilities between spike pairs are necessary for the second order expansion. Parameterising the conditional probability between two spikes by the scaled correlation $\gamma_{ab}(t_i^a,t_j^b;s)$ […], we can now write down the probabilities required by Eq. (1).
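As an illustration of these quantities, the trial-averaged rates $\bar r_a(t;s)$ and a scaled pair correlation can be estimated from binned trials at one stimulus. This sketch assumes $\gamma$ is defined so that the probability of spikes in two bins equals $\bar r_a \bar r_b[1+\gamma_{ab}]\Delta t^2$, the form in which it enters the pair probabilities of Eq. (3). The function name, the flattening of (cell, bin) pairs into a single index, and the convention $1+\gamma = 0$ for a bin paired with itself (a binary bin cannot hold two spikes) are assumptions. Note that pair counts are accumulated from every trial, whatever its total number of spikes.

```python
import numpy as np


def rates_and_correlations(n, dt):
    """Estimate rbar (spikes/s) and gamma at one stimulus.

    n:  binary array (n_trials, K); the columns enumerate (cell, bin) pairs.
    dt: bin width in seconds.
    """
    n = np.asarray(n, dtype=float)
    p1 = n.mean(axis=0)                  # P(spike in bin k | s) = rbar_k * dt
    p2 = (n.T @ n) / n.shape[0]          # P(spikes in bins k and l | s)
    denom = np.outer(p1, p1)
    gamma = np.zeros_like(p2)            # left at 0 where a rate is zero (arbitrary)
    nz = denom > 0
    gamma[nz] = p2[nz] / denom[nz] - 1.0
    np.fill_diagonal(gamma, -1.0)        # a bin cannot pair with itself
    return p1 / dt, gamma


# Four trials of two bins in which the spikes always co-occur.
trials = np.array([[1, 1], [0, 0], [1, 1], [0, 0]])
rbar, gamma = rates_and_correlations(trials, dt=0.005)
print(rbar, gamma[0, 1])   # rates of about 100 spikes/s; gamma[0, 1] = 1.0
```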

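Denoting the no-spike event as $0$ and the joint occurrence of a spike from cell $a$ at time $t_1^a$ and a spike from cell $b$ at time $t_2^b$ as $t_1^a t_2^b$, the conditional response probabilities are, to second order:

$$
\begin{aligned}
p(0|s) &= 1 - \sum_{a=1}^{C}\sum_{t_1^a} \bar r_a(t_1^a;s)\,\Delta t + \frac{1}{2}\sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b} \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2,\\
p(t_1^a|s) &= \bar r_a(t_1^a;s)\,\Delta t - \bar r_a(t_1^a;s)\sum_{b=1}^{C}\sum_{t_2^b}\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2, \qquad a=1,\dots,C,\\
p(t_1^a t_2^b|s) &= \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2, \qquad a=1,\dots,C,\quad b=1,\dots,C. \qquad (3)
\end{aligned}
$$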
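A direct transcription of Eq. (3) makes one of its properties easy to check: counting each unordered spike pair once (the factor 1/2 below, because the double sum visits both orderings), the no-spike, single-spike and two-spike probabilities sum exactly to one. The sketch assumes (cell, bin) pairs flattened into one index and a symmetric `gamma` with $1+\gamma = 0$ on its diagonal; `second_order_probabilities` is a hypothetical helper.

```python
import numpy as np


def second_order_probabilities(rbar, gamma, dt):
    """Word probabilities of Eq. (3) at one stimulus.

    Returns p0 (no spikes), p1[k] (a single spike in bin k) and
    p2[k, l] (spikes in bins k and l; each unordered pair appears twice).
    """
    x = np.asarray(rbar, dtype=float) * dt             # rbar_k * dt
    p2 = np.outer(x, x) * (1.0 + np.asarray(gamma))   # rbar_a rbar_b (1 + gamma) dt^2
    p1 = x - p2.sum(axis=1)                           # second-order correction
    p0 = 1.0 - x.sum() + 0.5 * p2.sum()
    return p0, p1, p2


rbar = np.array([10.0, 20.0, 5.0])                    # spikes/s (illustrative)
gamma = np.array([[-1.0, 0.3, 0.0],
                  [0.3, -1.0, -0.2],
                  [0.0, -0.2, -1.0]])
p0, p1, p2 = second_order_probabilities(rbar, gamma, dt=0.002)
print(p0 + p1.sum() + 0.5 * p2.sum())                # 1.0 up to rounding
```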

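The unconditional response probabilities are simply $p(\{t_i^a\}) = \langle p(\{t_i^a\}|s)\rangle_s$. Inserting $p(\{t_i^a\})$ into Eq. (1) and keeping only leading order terms yields for the first order total entropy

$$T H_t = \frac{1}{\ln 2}\sum_{a=1}^{C}\sum_{t_1^a}\langle \bar r_a(t_1^a;s)\,\Delta t\rangle_s - \sum_{a=1}^{C}\sum_{t_1^a}\langle \bar r_a(t_1^a;s)\,\Delta t\rangle_s \log_2 \langle \bar r_a(t_1^a;s)\,\Delta t\rangle_s. \qquad (4)$$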
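Eq. (4), and its noise counterpart with the stimulus average moved outside the second term, can be evaluated directly from stimulus-conditional rates. A minimal sketch, assuming rates given as an array of shape (stimuli, $K$) with (cell, bin) pairs flattened into $K$, and $0\log 0 = 0$; the function names are hypothetical.

```python
import numpy as np


def xlog2x(x):
    """Elementwise x * log2(x), with 0 * log2(0) = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x * np.log2(np.where(x > 0, x, 1.0)), 0.0)


def first_order_entropies(rbar, dt, p_s):
    """First-order total (Eq. 4) and noise entropies, and their difference, in bits.

    rbar: (S, K) trial-averaged rates; p_s: (S,) stimulus probabilities.
    """
    x = np.asarray(rbar, dtype=float) * dt       # rbar * dt at each stimulus
    p_s = np.asarray(p_s, dtype=float)
    xm = p_s @ x                                 # <rbar dt>_s
    h_total = xm.sum() / np.log(2.0) - xlog2x(xm).sum()
    # Noise entropy: a single stimulus average around the whole second term.
    h_noise = xm.sum() / np.log(2.0) - p_s @ xlog2x(x).sum(axis=1)
    return float(h_total), float(h_noise), float(h_total - h_noise)


# One bin firing at 100 spikes/s for one stimulus and silent for the other.
h_total, h_noise, info = first_order_entropies([[100.0], [0.0]], dt=0.001, p_s=[0.5, 0.5])
print(info)   # about 0.05 bits
```

The first-order terms depend only on the rates; correlations first enter at second order.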
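Inserting $p(\{t_i^a\}|s)$ instead yields a similar expression for the first order noise entropy $T H_t^{\rm noise}$, except with a single stimulus average $\langle\cdot\rangle_s$ around the entire second term. Continuing the expansion, and noting that a factor of 1/2 is introduced to prevent overcounting of equivalent permutations, the additional terms up to second order are:

$$
\begin{aligned}
\frac{T^2}{2}H_{tt} ={}& \frac{1}{2\ln 2}\sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b}\Big\{\big\langle \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\big\rangle_s - \langle \bar r_a(t_1^a;s)\rangle_s\,\langle \bar r_b(t_2^b;s)\rangle_s\Big\}\Delta t^2\\
&- \frac{1}{2}\sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b}\big\langle \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2\big\rangle_s\,\log_2\big\langle \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2\big\rangle_s\\
&+ \sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b}\big\langle \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2\big\rangle_s\,\log_2\langle \bar r_a(t_1^a;s)\,\Delta t\rangle_s, \qquad (5)
\end{aligned}
$$

$$
\begin{aligned}
\frac{T^2}{2}H_{tt}^{\rm noise} ={}& \frac{1}{2\ln 2}\sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b}\big\langle \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,\gamma_{ab}(t_1^a,t_2^b;s)\big\rangle_s\,\Delta t^2\\
&+ \frac{1}{2}\sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b}\Big\langle \bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]\,\Delta t^2\,\log_2\frac{\bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)}{\bar r_a(t_1^a;s)\,\bar r_b(t_2^b;s)\,[1+\gamma_{ab}(t_1^a,t_2^b;s)]}\Big\rangle_s. \qquad (6)
\end{aligned}
$$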
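Eqs. (5) and (6) can likewise be transcribed into a short numerical sketch. It assumes stimulus-conditional rates `rbar` of shape (stimuli, $K$) and symmetric correlations `gamma` of shape (stimuli, $K$, $K$), with (cell, bin) pairs flattened into $K$ and $1+\gamma = 0$ on each diagonal; the names are hypothetical. A convenient check: for independent bins at a single stimulus, both second-order terms should reduce to $-\sum_k x_k^2/(2\ln 2)$ with $x_k = \bar r_k\Delta t$, which is the quadratic term in the expansion of a sum of binary entropies.

```python
import numpy as np


def xlog2x(x):
    """Elementwise x * log2(x), with 0 * log2(0) = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x * np.log2(np.where(x > 0, x, 1.0)), 0.0)


def second_order_terms(rbar, gamma, dt, p_s):
    """Second-order total (Eq. 5) and noise (Eq. 6) entropy terms, in bits.

    rbar:  (S, K) trial-averaged rates, (cell, bin) pairs flattened into K.
    gamma: (S, K, K) symmetric scaled correlations, 1 + gamma = 0 on diagonals.
    p_s:   (S,) stimulus probabilities.
    """
    p_s = np.asarray(p_s, dtype=float)
    x = np.asarray(rbar, dtype=float) * dt                 # rbar * dt
    xx = x[:, :, None] * x[:, None, :]                     # rbar_a rbar_b dt^2
    one_plus_g = 1.0 + np.asarray(gamma, dtype=float)
    J = xx * one_plus_g                                    # pair probabilities, Eq. (3)
    xm = p_s @ x                                           # <rbar dt>_s
    Jm = np.tensordot(p_s, J, axes=1)                      # <J>_s
    ln2 = np.log(2.0)

    # Eq. (5): the logarithms act on stimulus-averaged quantities.
    log_xm = np.log2(xm, out=np.zeros_like(xm), where=xm > 0)
    h2_total = ((Jm - np.outer(xm, xm)).sum() / (2 * ln2)
                - 0.5 * xlog2x(Jm).sum()
                + (Jm * log_xm[:, None]).sum())

    # Eq. (6): J log2[rbar_a rbar_b / (rbar_a rbar_b (1 + gamma))] = -J log2(1 + gamma).
    log_1pg = np.log2(one_plus_g, out=np.zeros_like(J), where=J > 0)
    h2_noise = (p_s @ (J - xx).sum(axis=(1, 2)) / (2 * ln2)
                - 0.5 * (p_s @ (J * log_1pg).sum(axis=(1, 2))))
    return float(h2_total), float(h2_noise), float(h2_total - h2_noise)


# Independent bins at one stimulus: both terms equal -(sum_k x_k^2) / (2 ln 2).
rbar = np.array([[10.0, 20.0, 5.0]])
gamma = np.zeros((1, 3, 3)) - np.eye(3)
print(second_order_terms(rbar, gamma, 0.002, [1.0]))
```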

The difference between the total and noise entropies gives the expression for the mutual information detailed in […].

It has recently been found that correlations, even if independent of the stimulus identity, can increase the information present in a neural population […]. This applies both to cross-correlations between the spike trains from different neurons and to auto-correlations in the spike train from a single neuron […]. The equations derived above add something to the explanation of this phenomenon provided in […]. Observe that the second order total entropy can be rewritten in a form which shows that it depends only upon the grand mean firing rates across stimuli, and upon the correlation coefficient of the whole spike train, $\Gamma_{ab}(t_i^a,t_j^b)$ (defined across all trials rather than across the trials of a single stimulus, as for $\gamma_{ab}(t_i^a,t_j^b;s)$), such that $\langle \bar r_a(t_i^a;s)\,\bar r_b(t_j^b;s)[1+\gamma_{ab}(t_i^a,t_j^b;s)]\rangle_s = \langle \bar r_a(t_i^a;s)\rangle_s\langle \bar r_b(t_j^b;s)\rangle_s[1+\Gamma_{ab}(t_i^a,t_j^b)]$. Thus,

$$\frac{T^2}{2}H_{tt} = \frac{1}{2\ln 2}\sum_{a,b=1}^{C}\sum_{t_1^a}\sum_{t_2^b}\langle \bar r_a(t_1^a;s)\rangle_s\,\langle \bar r_b(t_2^b;s)\rangle_s\,\Delta t^2\,\Big\{\Gamma_{ab}(t_1^a,t_2^b) - [1+\Gamma_{ab}(t_1^a,t_2^b)]\ln[1+\Gamma_{ab}(t_1^a,t_2^b)]\Big\}. \qquad (7)$$
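Eq. (7) can be evaluated from the grand mean rates and $\Gamma$ alone. The sketch below assumes $1+\Gamma_{ab} = \langle \bar r_a \bar r_b[1+\gamma_{ab}]\rangle_s / (\langle \bar r_a\rangle_s\langle \bar r_b\rangle_s)$, the relation that takes Eq. (5) into Eq. (7), flattened (cell, bin) indices and $0\ln 0 = 0$; the names are hypothetical. The bracketed function $\Gamma - (1+\Gamma)\ln(1+\Gamma)$ has derivative $-\ln(1+\Gamma)$, so it is never positive and vanishes only at $\Gamma = 0$; the last lines check this on a grid.

```python
import numpy as np


def bracket(Gamma):
    """Gamma - (1 + Gamma) ln(1 + Gamma), taking 0 ln 0 = 0 at Gamma = -1."""
    g = 1.0 + np.asarray(Gamma, dtype=float)
    g_ln_g = np.where(g > 0, g * np.log(np.where(g > 0, g, 1.0)), 0.0)
    return (g - 1.0) - g_ln_g


def second_order_total_from_Gamma(rbar, gamma, dt, p_s):
    """Second-order total entropy term of Eq. (7), in bits.

    rbar: (S, K) rates; gamma: (S, K, K) symmetric; p_s: (S,) probabilities.
    """
    p_s = np.asarray(p_s, dtype=float)
    x = np.asarray(rbar, dtype=float) * dt
    J = x[:, :, None] * x[:, None, :] * (1.0 + np.asarray(gamma, dtype=float))
    Jm = np.tensordot(p_s, J, axes=1)            # <rbar_a rbar_b (1 + gamma)>_s dt^2
    xm = p_s @ x                                 # <rbar_a dt>_s
    outer = np.outer(xm, xm)
    # 1 + Gamma; where a grand mean rate is zero, Gamma is set to 0 (its weight is 0).
    one_plus_G = np.divide(Jm, outer, out=np.ones_like(Jm), where=outer > 0)
    return float((outer * bracket(one_plus_G - 1.0)).sum() / (2 * np.log(2.0)))


# The bracket is <= 0 everywhere and equals 0 at Gamma = 0.
grid = np.linspace(-1.0, 5.0, 601)
print(bracket(grid).max(), bracket(0.0))
```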

Since the bracketed function $\Gamma - (1+\Gamma)\ln(1+\Gamma)$ is never positive and vanishes only at $\Gamma = 0$, it follows that the second order entropy is maximal when $\Gamma = 0$, and non-zero overall correlations in the spike trains (indicating statistical dependence) always decrease the total response entropy. $\gamma(s)$ acts on the noise entropy as $\Gamma$ does upon the total entropy: it can only decrease the conditional entropy. The effect of $\gamma(s)$ on the total entropy is more complex, depending upon the correlation of the firing across stimuli. $\gamma(s)$ can be chosen so as to increase the total entropy (and thus the information, with the noise entropy fixed), and this increase will be maximal for the $\gamma(s)$ which lead exactly to $\Gamma = 0$. Interactions between neurons or between spike times may therefore eliminate or reduce the effect of statistical dependencies introduced by other covariations.
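A two-stimulus example makes this point concrete. Suppose, hypothetically, that a pair of bins fires with equal rates that covary across two equiprobable stimuli. With $\gamma(s) = 0$, the rate covariation alone gives $\Gamma > 0$. A constant $\gamma$ with $1+\gamma = \langle x\rangle_s^2/\langle x^2\rangle_s$ (here $x = \bar r\Delta t$) restores $\Gamma = 0$ and so removes the entropy loss in the bracket of Eq. (7). The numbers are invented for illustration, and only the total entropy term is examined.

```python
import numpy as np


def pair_Gamma(x_k, x_l, gamma, p_s):
    """Overall correlation Gamma for bins k, l from per-stimulus x = rbar * dt."""
    num = np.sum(p_s * x_k * x_l * (1.0 + gamma))       # <x_k x_l (1 + gamma)>_s
    den = np.sum(p_s * x_k) * np.sum(p_s * x_l)         # <x_k>_s <x_l>_s
    return num / den - 1.0


def bracket(G):
    """Gamma - (1 + Gamma) ln(1 + Gamma): the factor in Eq. (7), never positive."""
    return G - (1.0 + G) * np.log1p(G)


p_s = np.array([0.5, 0.5])
x = np.array([0.02, 0.06])               # same bin pair under two stimuli (hypothetical)

G0 = pair_Gamma(x, x, 0.0, p_s)          # about 0.25: rate covariation alone
gamma_star = (p_s @ x) ** 2 / (p_s @ (x * x)) - 1.0   # about -0.2
G1 = pair_Gamma(x, x, gamma_star, p_s)   # 0 up to rounding
print(G0, bracket(G0), G1, bracket(G1))
```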

In practice, the rate and correlation functions must be estimated from a limited number of experimental trials, which leads to a bias in each of the entropy components. This bias was corrected for as described in […]; however, the sampling advantage described below was observed with this correction, without any bias correction, and with other bias correction approaches such as that used in […].
FIG. 2. Data-size dependence of noise entropy estimates for a V1 complex cell. Time windows of 40 ms (half a stimulus cycle) were broken into words of length 12 for the analysis. The upper panel, (i), shows entropy estimates prior to correction for bias, normalised by the asymptotic (true) entropy. The dotted line indicates the 'brute force' sampling characteristics for a Poisson process with the same time-dependent firing rate. The lower panel, (ii), shows the bias-corrected versions of these estimates, and in addition the Ma lower bound upon the entropy. The asymptotic entropy was obtained by extrapolating from the curves; the results agree to within 1%. Error bars were obtained by bootstrap resampling.



To demonstrate its applicability, we applied the series entropy analysis to data recorded from the primary visual cortex (V1) of anaesthetised macaque monkeys […]. Fig. 2 examines, for a typical V1 complex cell, the dependence of the accuracy of the noise entropy estimate upon the number of experimental trials utilised. It is the noise entropy which is most affected by sampling constraints, so we shall concentrate upon this quantity here. The top panel shows the estimates before application of a bias removal procedure, using the series (our technique) and 'brute force' (simple application of Eq. (1)) approaches. The entropies are expressed as a fraction of the asymptotic entropy obtained by polynomial extrapolation […]. Reliable extrapolation to the asymptotic entropy was possible because of the large amount of data that happened to be available for this cell (which was chosen with that in mind; more usually between 20 and 100 trials were available). This allowed us to compare the performance of the methods on smaller subsets of the data against a known reference. The fact that the series and brute-force estimators converged to the same asymptotic value for this cell indicates that higher order correlations amongst spike times contributed little to the entropy.

The better performance of the series approach can be understood by considering that (at second order) it requires sampling from only the first two moments of the probability distribution, whereas the 'brute force' approach depends upon all moments. Higher moments have to be computed from events with lower and lower probability, since a pattern of $k$ spikes has probability proportional to $\Delta t^k$; estimation of these lower probability events is more error-prone, and leads to the larger bias of the 'brute force' approach.

Also shown in Fig. 2 is the Ma lower bound upon the entropy […], which has been proposed as a useful bound that is relatively insensitive to sampling problems […]. The Ma bound is tight only when the probability distribution of words at fixed spike count is close to uniform. It can be seen that for the V1 complex cell data, the Ma bound is not tight at all. To understand the behaviour of the Ma bound for short time windows, we calculated the terms of its series expansion. The Ma entropy already differs from the true entropy at first order:



This coincides with the first order total entropy, Eq. (4), only if there are no variations of rate across time and cells. If there were higher frequency rate variations, or more cells with different response profiles, the Ma bound would be even less useful.

Estimation quality depends not only upon sampling bias, but also upon variance; both can be summarised by the RMS error of the entropy estimate. We investigated the behaviour of the RMS error by fitting a Poisson model with matched time-dependent firing rate to the experimental data of Fig. 2. This model, although yielding a 5% lower noise entropy (because of correlations in the real data), predicted the 'brute force' sampling characteristics of Fig. 2 almost exactly. The model was used to generate a larger set of data (10,000 trials, or 160,000 stimulus presentations in total). This model yields worst-case sampling for the 'brute force' estimator; worst-case sampling for the series estimator would be achieved by an even spread of probability throughout only the second order response space. The simulation serves to compare the estimators in a statistical regime similar to that of the typical cell of Fig. 2.
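The RMS-error experiment can be sketched for the 'brute force' estimator under a binned Poisson model. This is an illustration under stated assumptions, not the procedure used for Fig. 3: it takes a single stimulus, treats each bin as independently containing a spike with probability $1-e^{-r(t)\Delta t}$ (the probability of at least one Poisson spike), and uses a hypothetical rate profile. The function names and parameters are also hypothetical.

```python
import numpy as np
from collections import Counter


def plugin_entropy(words):
    """Plug-in ('brute force') entropy in bits of a sequence of hashable words."""
    counts = np.array(list(Counter(words).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def rms_error_plugin(rate, dt, n_trials, n_repeats=100, seed=0):
    """True word entropy and RMS error (bits) of its plug-in estimate.

    Bins are independent; bin t holds a spike with prob. 1 - exp(-rate[t] * dt).
    """
    rng = np.random.default_rng(seed)
    p = 1.0 - np.exp(-np.asarray(rate, dtype=float) * dt)
    # Independent bins: the true entropy is a sum of binary entropies.
    h_true = float(-np.sum(p * np.log2(p) + (1 - p) * np.log2(1 - p)))
    errors = np.empty(n_repeats)
    for i in range(n_repeats):
        spikes = rng.random((n_trials, p.size)) < p          # (trials, bins)
        errors[i] = plugin_entropy(row.tobytes() for row in spikes) - h_true
    return h_true, float(np.sqrt(np.mean(errors ** 2)))


dt = 0.040 / 12                                          # 40 ms window, 12 bins
rate = 20.0 + 15.0 * np.sin(np.linspace(0, np.pi, 12))  # hypothetical profile, spikes/s
for n in (20, 100, 500, 2000):
    print(n, rms_error_plugin(rate, dt, n, n_repeats=50))
```

An analogous loop around the series estimator, fed with rates and correlations estimated from the same simulated trials, would give the comparison plotted in Fig. 3.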
FIG. 3. RMS error scaling characteristics for wordlengths from 4 to 12 in the simulation. The true noise entropies were 2.0, 3.5 and 4.7 bits respectively.

Fig. 3 shows how the RMS error before bias correction scales with data size in this simulation. Scaling is qualitatively similar (but with a sharper decrease) after correction. The scaling behaviour resulting from the simulation predicts that, with a 'brute force' approach, an RMS error of 2% of the entropy at a wordlength of 12 would require around 1400 trials with, and more than 5000 trials without, application of the finite sampling correction. The series estimator reduces these requirements to approximately 50 and 400 trials respectively. These figures depend upon the data statistics and should be checked on a case by case basis; however, the dimensionality reduction afforded by the series expansion provides a general improvement in the quality of entropy estimates for short time windows.
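The dimensionality reduction can be quantified with a simple count, assuming that each stimulus is described either by a full word distribution or by the rates and pair correlations entering Eq. (3). For $K = CL$ (cell, bin) pairs, the 'brute force' approach must estimate $2^K - 1$ free word probabilities per stimulus, whereas the second order series needs $K$ rates and $K(K-1)/2$ pair correlations. The helper names are hypothetical.

```python
from math import comb


def n_params_brute_force(n_cells, n_bins):
    """Free parameters of a full distribution over binary population words."""
    return 2 ** (n_cells * n_bins) - 1


def n_params_series(n_cells, n_bins):
    """Rates plus distinct pairwise correlations used at second order."""
    k = n_cells * n_bins
    return k + comb(k, 2)


for L in (4, 8, 12):
    print(L, n_params_brute_force(1, L), n_params_series(1, L))
# 4 15 10
# 8 255 36
# 12 4095 78
```

At the wordlength of 12 used above, the count falls from 4095 to 78 quantities per stimulus.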

Some readers may wonder whether this new method amounts to computing the entropy after discarding words that contain more than two spikes. This is not the case: the proposed method considers pairwise interactions amongst all spikes in the word, no matter how many there are. It is thus able (unlike a truncated brute force approach) to take into account almost all of the entropy of longer words, while retaining the sampling benefits of a second order method.

As neuroscience enters a quantitative phase, informa- 
tion theoretic techniques are being found useful for the 
analysis of data from physiological experiments. The 
methods developed here may broaden the scope of the 
study of neuronal information properties. In particular, 
they render feasible the information theoretic analysis 
of some recordings from anaesthetised and awake mam- 
malian cerebral cortices. 

SRS is supported by the HHMI, and SP by the Wellcome Trust.